BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages
We present BPEmb, a collection of pre-trained subword unit embeddings in 275
languages, based on Byte-Pair Encoding (BPE). In an evaluation using
fine-grained entity typing as testbed, BPEmb performs competitively, and for
some languages better than alternative subword approaches, while requiring vastly fewer resources and no tokenization. BPEmb is available at https://github.com/bheinzerling/bpemb
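For readers who want to try the embeddings, here is a minimal usage sketch with the bpemb Python package that accompanies the repository (parameter names follow the package's documented interface):

    from bpemb import BPEmb

    # Load 100-dimensional English embeddings over a 10k BPE vocabulary;
    # the model files are downloaded automatically on first use.
    bpemb_en = BPEmb(lang="en", vs=10000, dim=100)

    subwords = bpemb_en.encode("tokenization-free embeddings")  # BPE subword strings
    vectors = bpemb_en.embed("tokenization-free embeddings")    # one vector per subword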
Aspects of Coherence for Entity Analysis
Natural language understanding is an important topic in natural language processing. Given a text, a computer program should, at the very least, be able to understand what the text is about, and ideally also situate it in its extra-textual context and understand what purpose it serves. What exactly it means to understand what a text is about is an open question, but it is generally accepted that, at a minimum, understanding involves being able to answer questions like "Who did what to whom? Where? When? How? And Why?". Entity analysis, the computational analysis of entities mentioned in a text, aims to support answering the questions "Who?" and "Whom?" by identifying entities mentioned in a text. If the answers to "Where?" and "When?" are specific, named locations and events, entity analysis can also provide these answers. Entity analysis aims to answer these questions by performing entity linking, that is, linking mentions of entities to their corresponding entry in a knowledge base; coreference resolution, that is, identifying all mentions in a text that refer to the same entity; and entity typing, that is, assigning a label such as Person to mentions of entities.
In this thesis, we study how different aspects of coherence can be exploited to improve entity analysis. Our main contribution is a method that allows exploiting knowledge-rich, specific aspects of coherence, namely geographic, temporal, and entity type coherence. Geographic coherence expresses the intuition that entities mentioned in a text tend to be geographically close. Similarly, temporal coherence captures the intuition that entities mentioned in a text tend to be close in the temporal dimension. Entity type coherence is based on the observation that in a text about a certain topic, such as sports, the entities mentioned tend to have the same or related entity types, such as sports team or athlete. We show how to integrate features modeling these aspects of coherence into entity linking systems and establish their utility in extensive experiments covering different datasets and systems. Since entity linking often requires computationally expensive joint, global optimization, we propose a simple but effective rule-based approach that enjoys some of the benefits of joint, global approaches while avoiding some of their drawbacks. To enable convenient error analysis for system developers, we introduce a tool for visual analysis of entity linking system output. Investigating another aspect of coherence, namely the coherence between a predicate and its arguments, we devise a distributed model of selectional preferences and assess its impact on a neural coreference resolution system. Our final contribution examines how multilingual entity typing can be improved by incorporating subword information. We train and make publicly available subword embeddings in 275 languages and show their utility in a multilingual entity typing task.
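To make the geographic coherence idea concrete, here is a hypothetical sketch (illustrative only, not the thesis's implementation) of how a candidate entity could be scored by its proximity to the other entities mentioned in the same text:

    import math

    def haversine_km(a, b):
        # Great-circle distance in km between two (lat, lon) points given in degrees.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        h = (math.sin((lat2 - lat1) / 2) ** 2
             + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * math.asin(math.sqrt(h))

    def geo_coherence(candidate, others):
        # Higher score when the candidate's coordinates lie close to those of
        # the other entities mentioned in the text.
        if not others:
            return 0.0
        mean_dist = sum(haversine_km(candidate, o) for o in others) / len(others)
        return 1.0 / (1.0 + mean_dist)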
Cross-stitching Text and Knowledge Graph Encoders for Distantly Supervised Relation Extraction
Bi-encoder architectures for distantly-supervised relation extraction are
designed to make use of the complementary information found in text and
knowledge graphs (KG). However, current architectures suffer from two
drawbacks. They either do not allow any sharing between the text encoder and
the KG encoder at all, or, in case of models with KG-to-text attention, only
share information in one direction. Here, we introduce cross-stitch
bi-encoders, which allow full interaction between the text encoder and the KG
encoder via a cross-stitch mechanism. The cross-stitch mechanism allows sharing
and updating representations between the two encoders at any layer, with the
amount of sharing being dynamically controlled via cross-attention-based gates.
Experimental results on two relation extraction benchmarks from two different
domains show that enabling full interaction between the two encoders yields
strong improvements.
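As a schematic illustration (assumed dimensions and layer choices, not the paper's exact architecture), a cross-stitch layer with cross-attention-based gates might look like this in PyTorch:

    import torch
    import torch.nn as nn

    class CrossStitchGate(nn.Module):
        # Schematic cross-stitch layer: each encoder's hidden states are updated
        # with a gated, cross-attended summary of the other encoder's states.
        def __init__(self, dim, heads=4):  # dim must be divisible by heads
            super().__init__()
            self.attn_t2k = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.attn_k2t = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.gate_t = nn.Linear(2 * dim, dim)
            self.gate_k = nn.Linear(2 * dim, dim)

        def forward(self, h_text, h_kg):
            # Each side attends over the other side's representations.
            t_from_k, _ = self.attn_t2k(h_text, h_kg, h_kg)
            k_from_t, _ = self.attn_k2t(h_kg, h_text, h_text)
            # Sigmoid gates dynamically control how much information flows across.
            g_t = torch.sigmoid(self.gate_t(torch.cat([h_text, t_from_k], dim=-1)))
            g_k = torch.sigmoid(self.gate_k(torch.cat([h_kg, k_from_t], dim=-1)))
            return h_text + g_t * t_from_k, h_kg + g_k * k_from_t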
Fine-Grained Entity Typing in Hyperbolic Space
How can we represent hierarchical information present in large type
inventories for entity typing? We study the ability of hyperbolic embeddings to
capture hierarchical relations between mentions in context and their target
types in a shared vector space. We evaluate on two datasets and investigate two
different techniques for creating a large hierarchical entity type inventory:
from an expert-generated ontology and by automatically mining type
co-occurrences. We find that the hyperbolic model yields improvements over its
Euclidean counterpart in some, but not all cases. Our analysis suggests that
the adequacy of this geometry depends on the granularity of the type inventory
and the way hierarchical relations are inferred.
Comment: 12 pages, 4 figures, final version, accepted at the 4th Workshop on Representation Learning for NLP (RepL4NLP), held in conjunction with ACL 2019
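Hyperbolic models in this line of work typically score mention-type pairs by distance in the Poincaré ball; for reference, a minimal implementation of that distance (the standard formula, not the paper's code):

    import torch

    def poincare_distance(u, v, eps=1e-5):
        # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
        # for points u, v inside the unit ball.
        sq_u = torch.clamp((u * u).sum(dim=-1), 0, 1 - eps)
        sq_v = torch.clamp((v * v).sum(dim=-1), 0, 1 - eps)
        sq_diff = ((u - v) ** 2).sum(dim=-1)
        return torch.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))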
Test-time Augmentation for Factual Probing
Factual probing is a method that uses prompts to test if a language model
"knows" certain world knowledge facts. A problem in factual probing is that
small changes to the prompt can lead to large changes in model output. Previous
work aimed to alleviate this problem by optimizing prompts via text mining or
fine-tuning. However, such approaches are relation-specific and do not
generalize to unseen relation types. Here, we propose to use test-time
augmentation (TTA) as a relation-agnostic method for reducing sensitivity to
prompt variations by automatically augmenting and ensembling prompts at test
time. Experiments show improved model calibration, i.e., with TTA, model
confidence better reflects prediction accuracy. Improvements in prediction
accuracy are observed for some models, but for other models, TTA leads to
degradation. Error analysis identifies the difficulty of producing high-quality
prompt variations as the main challenge for TTA.
Comment: 12 pages, 4 figures, accepted to EMNLP 2023 Findings (short paper)
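In outline, TTA for factual probing amounts to querying the model with several automatically generated prompt variants and averaging the resulting answer distributions. A minimal sketch, where the paraphrasing function and the model's answer distribution are passed in as assumed callables rather than taken from the paper's codebase:

    def tta_predict(prompt, paraphrase, answer_probs, n_variants=8):
        # paraphrase:   str -> str, produces one prompt variant (assumed helper)
        # answer_probs: str -> dict mapping answers to probabilities (assumed helper)
        variants = [prompt] + [paraphrase(prompt) for _ in range(n_variants - 1)]
        ensembled = {}
        for v in variants:
            for answer, prob in answer_probs(v).items():
                ensembled[answer] = ensembled.get(answer, 0.0) + prob / len(variants)
        # The averaged distribution yields both a prediction and a confidence estimate.
        return max(ensembled.items(), key=lambda kv: kv[1])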
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Large language models (LLMs) have been shown to be able to perform new tasks
based on a few demonstrations or natural language instructions. While these
capabilities have led to widespread adoption, most LLMs are developed by
resource-rich organizations and are frequently kept from the public. As a step
towards democratizing this powerful technology, we present BLOOM, a
176B-parameter open-access language model designed and built thanks to a
collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer
language model that was trained on the ROOTS corpus, a dataset comprising
hundreds of sources in 46 natural and 13 programming languages (59 in total).
We find that BLOOM achieves competitive performance on a wide variety of
benchmarks, with stronger results after undergoing multitask prompted
finetuning. To facilitate future research and applications using LLMs, we
publicly release our models and code under the Responsible AI License.
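The released checkpoints can be queried with the Hugging Face Transformers library; a minimal sketch using one of the smaller public BLOOM checkpoints (the full 176B model, published as bigscience/bloom, requires substantial hardware):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
    model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

    inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))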